Corpus-Based Rules for Czech Verb Discontinuous Constituents

Authors

  • Eva Zácková
  • Karel Pala
Abstract

In this paper we present a method for extracting general structures of verb groups from a tagged and fully disambiguated corpus, and the subsequent use of these structures to build a formal grammar in the Prolog DCG fashion. Our goal is to apply them as rules for the analysis of Czech verb groups in non-disambiguated, grammatically tagged Czech corpus texts. The problem of recognizing discontinuous verb constituents in Czech is also addressed, and the statistical data obtained are presented.
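
The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the coarse tags (`AUX`, `REFL`, `VERB`), the `GAP` marker, the toy sentences, and the DCG-style rule rendering are all assumptions made for illustration, and each sentence is assumed to contain a single verb group.

```python
from collections import Counter

# Hypothetical coarse tags treated as verb-group members (not the paper's tagset).
VERB_GROUP_TAGS = {"AUX", "REFL", "VERB"}

# Toy disambiguated corpus: one clause per sentence, tokens as (word, tag).
CORPUS = [
    [("Já", "PRON"), ("jsem", "AUX"), ("se", "REFL"), ("včera", "ADV"), ("vrátil", "VERB")],
    [("Vrátil", "VERB"), ("jsem", "AUX"), ("se", "REFL")],
    [("Petr", "NOUN"), ("bude", "AUX"), ("zítra", "ADV"), ("pracovat", "VERB")],
]

def verb_group_pattern(sentence):
    """Return the verb-group tag sequence; GAP marks intervening material."""
    positions = [i for i, (_, tag) in enumerate(sentence) if tag in VERB_GROUP_TAGS]
    if not positions:
        return ()
    pattern = []
    # Walk from the first to the last verb-group token of the sentence.
    for i in range(positions[0], positions[-1] + 1):
        tag = sentence[i][1]
        if tag in VERB_GROUP_TAGS:
            pattern.append(tag)
        elif pattern[-1] != "GAP":
            # Collapse any run of non-group tokens into a single GAP.
            pattern.append("GAP")
    return tuple(pattern)

def pattern_counts(corpus):
    """Count how often each verb-group pattern occurs in the corpus."""
    return Counter(p for p in map(verb_group_pattern, corpus) if p)

def as_dcg_rule(pattern):
    """Render a pattern as a DCG-style rule string (illustrative format only)."""
    return "verb_group --> " + ", ".join(t.lower() for t in pattern) + "."

if __name__ == "__main__":
    for pattern, count in pattern_counts(CORPUS).most_common():
        print(count, as_dcg_rule(pattern))
```

A pattern containing `GAP` corresponds to a discontinuous verb group, so the counts give a rough picture of how often such constituents occur in the toy data.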

Similar resources

Recognition and Tagging of Compound Verb Groups in Czech

In Czech corpora, compound verb groups are usually tagged in a word-by-word manner. As a consequence, some of the morphological tags of particular components of the verb group lose their original meaning. We present a method for automatic recognition of compound verb groups in Czech. From an annotated corpus, 126 definite clause grammar rules were constructed. These rules describe all compound verb...

Applying Licenser Rules to a Grammar with Continuous Constituents

Licenser rules were originally introduced in Müller (1999) as part of a grammar based on discontinuous constituents. We propose licenser rules as a means to avoid underspecified empty elements in grammars with continuous constituents. We applied them to a verb movement analysis of the German main clause with right sentence bracket and to complement extraposition. To reduce the number of ...

Non-projectivity and valency

We describe the results of an investigation of a specific type of discontinuous construction, namely non-projective constructions concerning verbs and their arguments. This topic is especially important for languages with relatively free word order, such as Czech, which is the language we have primarily worked with. For comparison, we have included some results for English. The corpora used for bot...

Continuous or Discontinuous Constituents?

During the last years, several grammarians have argued for linguistic descriptions of language that use the con- ... has shown that in the worst case 2^n constituents can be built for an input string of length n if discontinuous constituents are allowed. As Carroll (1994) has demonstrated, such theoretical values are not of much help when it comes to practical systems. In the following I will compar...

Parsing String Generating Hypergraph Grammars

A string generating hypergraph grammar is a hyperedge replacement grammar where the resulting language consists of string graphs, i.e. hypergraphs modeling strings. With the help of these grammars, string languages like a^n b^n c^n can be modeled that cannot be generated by context-free grammars for strings. They are well suited to model discontinuous constituents in natural languages, i.e. constitu...

Publication date: 1999